Multi-scale Deep Learning Architectures for Person Re-identification
Person Re-identification (re-id) aims to match people across non-overlapping
camera views in a public space. It is a challenging problem because many people
captured in surveillance videos wear similar clothes. Consequently, the
differences in their appearance are often subtle and only detectable at the
right location and scales. Existing re-id models, particularly the recently
proposed deep-learning-based ones, match people at a single scale. In contrast,
in this paper, a novel multi-scale deep learning model is proposed. Our model
is able to learn deep discriminative feature representations at different
scales and automatically determine the most suitable scales for matching. The
importance of different spatial locations for extracting discriminative
features is also learned explicitly. Experiments are carried out to demonstrate
that the proposed model outperforms the state-of-the-art on a number of
benchmarks.
Comment: 9 pages, 3 figures, accepted by ICCV 201
Rethinking Person Re-identification from a Projection-on-Prototypes Perspective
Person Re-IDentification (Re-ID), as a retrieval task, has achieved tremendous
development over the past decade. Existing state-of-the-art methods follow an
analogous framework to first extract features from the input images and then
categorize them with a classifier. However, since there is no identity overlap
between training and testing sets, the classifier is often discarded during
inference. Only the extracted features are used for person retrieval via
distance metrics. In this paper, we rethink the role of the classifier in
person Re-ID, and advocate a new perspective to conceive the classifier as a
projection from image features to class prototypes. These prototypes are
exactly the learned parameters of the classifier. In this light, we describe
the identity of input images as similarities to all prototypes, which are then
utilized as more discriminative features to perform person Re-ID. We thereby
propose a new baseline, ProNet, which innovatively retains the function of the
classifier at the inference stage. To facilitate the learning of class
prototypes, both triplet loss and identity classification loss are applied to
features that undergo the projection by the classifier. An improved version of
ProNet++ is presented by further incorporating multi-granularity designs.
Experiments on four benchmarks demonstrate that our proposed ProNet is simple
yet effective, and significantly beats previous baselines. ProNet++ also
achieves competitive or even better results than transformer-based competitors.
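
The projection-on-prototypes idea described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and Euclidean distance for retrieval; the abstract specifies neither, and the function names and toy vectors below are hypothetical, not taken from ProNet.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def project_on_prototypes(feature, prototypes):
    """Describe a feature by its similarity to every class prototype.

    Assumption: cosine similarity; each prototype stands in for one row
    of the learned classifier weights.
    """
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(p))) for p in prototypes]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query, gallery, prototypes):
    """Return gallery indices sorted from nearest to farthest in prototype space."""
    q = project_on_prototypes(query, prototypes)
    dists = [euclidean(q, project_on_prototypes(g, prototypes)) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: dists[i])

# Toy data: three prototypes (training identities) and three gallery features.
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
gallery = [[0.0, 1.0, 0.1], [1.0, 0.2, 0.0], [0.1, 0.0, 1.0]]

descriptor = project_on_prototypes(query, prototypes)
ranking = rank_gallery(query, gallery, prototypes)
```

Here the descriptor has one entry per training identity, so test identities that never appeared in training are still compared through their similarity pattern over the prototypes.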
Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification
Cloth-changing person Re-IDentification (Re-ID) is a particularly challenging
task, suffering from two limitations of inferior identity-relevant features and
limited training samples. Existing methods mainly leverage auxiliary
information to facilitate discriminative feature learning, including
soft-biometrics features of shapes and gaits, and additional labels of
clothing. However, this information may be unavailable in real-world
applications. In this paper, we propose a novel FIne-grained Representation and
Recomposition (FIRe) framework to tackle both limitations without any
auxiliary information. Specifically, we first design a Fine-grained Feature
Mining (FFM) module to separately cluster images of each person. Images with
similar so-called fine-grained attributes (e.g., clothes and viewpoints) are
encouraged to cluster together. An attribute-aware classification loss is
introduced to perform fine-grained learning based on cluster labels, which are
not shared among different people, promoting the model to learn
identity-relevant features. Furthermore, by taking full advantage of the
clustered fine-grained attributes, we present a Fine-grained Attribute
Recomposition (FAR) module to recompose image features with different
attributes in the latent space. It can significantly enhance representations
for robust feature learning. Extensive experiments demonstrate that FIRe
can achieve state-of-the-art performance on five widely-used cloth-changing
person Re-ID benchmarks.
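
One detail from this abstract, cluster labels that are not shared among different people, can be sketched as follows. This is only an illustration of the labelling step: the per-person cluster assignments are assumed to be given already (the clustering algorithm is not named in the abstract), and `build_attribute_labels` is a hypothetical helper, not part of FIRe.

```python
def build_attribute_labels(person_ids, local_clusters):
    """Map each (person, local cluster) pair to its own global label.

    Two images get the same label only if they show the same person AND
    fall in the same per-person cluster, so no label spans two people.
    """
    label_of = {}
    labels = []
    for pid, cid in zip(person_ids, local_clusters):
        key = (pid, cid)
        if key not in label_of:
            label_of[key] = len(label_of)  # next unused global label
        labels.append(label_of[key])
    return labels, label_of

# Toy input: two people, each clustered separately into local clusters 0 and 1.
person_ids = ["A", "A", "A", "B", "B", "B"]
local_clusters = [0, 1, 0, 0, 0, 1]

labels, label_of = build_attribute_labels(person_ids, local_clusters)
print(labels)
```

Because cluster 0 of person A and cluster 0 of person B receive different global labels, a classifier trained on these labels cannot separate classes by clothing or viewpoint alone.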
Pushing the Limits of 3D Shape Generation at Scale
We present a significant breakthrough in 3D shape generation by scaling it to
unprecedented dimensions. Through the adaptation of the Auto-Regressive model
and the utilization of large language models, we have developed a remarkable
model with an astounding 3.6 billion trainable parameters, establishing it as
the largest 3D shape generation model to date, named Argus-3D. Our approach
addresses the limitations of existing methods by enhancing the quality and
diversity of generated 3D shapes. To tackle the challenges of high-resolution
3D shape generation, our model incorporates tri-plane features as latent
representations, effectively reducing computational complexity. Additionally,
we introduce a discrete codebook for efficient quantization of these
representations. Leveraging the power of transformers, we enable multi-modal
conditional generation, facilitating the production of diverse and visually
impressive 3D shapes. To train our expansive model, we leverage an ensemble of
publicly-available 3D datasets, consisting of a comprehensive collection of
approximately 900,000 objects from renowned repositories such as ModelNet40,
ShapeNet, Pix3D, 3D-Future, and Objaverse. This diverse dataset empowers our
model to learn from a wide range of object variations, bolstering its ability
to generate high-quality and diverse 3D shapes. Extensive experimentation
demonstrates the remarkable efficacy of our approach in significantly improving
the visual quality of generated 3D shapes. By pushing the boundaries of 3D
generation, introducing novel methods for latent representation learning, and
harnessing the power of transformers for multi-modal conditional generation,
our contributions pave the way for substantial advancements in the field. Our
work unlocks new possibilities for applications in gaming, virtual reality,
product design, and other domains that demand high-quality and diverse 3D
objects.
Comment: Project page: https://argus-3d.github.io
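
The discrete-codebook quantization mentioned in this abstract can be pictured with a generic nearest-neighbour lookup. This is a minimal sketch of plain vector quantization, assuming squared Euclidean distance and a tiny hand-written codebook; it is not the Argus-3D implementation, and all names and values are hypothetical.

```python
def quantize(vectors, codebook):
    """Replace each latent vector by the index of its nearest codebook entry."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices

def dequantize(indices, codebook):
    """Recover an approximate latent by looking each index up in the codebook."""
    return [list(codebook[i]) for i in indices]

# Toy codebook with three 2-D entries, and three latent vectors to encode.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

indices = quantize(latents, codebook)
reconstructed = dequantize(indices, codebook)
```

The resulting index sequence is the kind of discrete token stream an auto-regressive transformer can model, which is why quantization is a common step before such a model.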